Other-regarding preferences and altruistic punishment: 
A Darwinian perspective 
This article examines the effect of different other-regarding preference types on the emergence of 
altruistic punishment behavior from an evolutionary perspective. Our findings corroborate, com- 
plement, and interlink the experimental and theoretical literature that has shown the importance 
of other-regarding behavior in various decision settings. We find that a selfish variant of inequity 
aversion is sufficient to quantitatively explain the level of punishment observed in contemporary 
experiments: If disadvantageous inequity aversion is the predominant preference type, altruistic 
punishment emerges in our model to a level that precisely matches the empirical observations. We 
use a new approach that closely combines empirical results from a public goods experiment together 
with an evolutionary simulation model. Hereby we apply ideas from behavioral economics, complex 
system science, and evolutionary biology. 
The convergence of individual behaviors to common
norms and the punishment of norm violators are often
observed patterns in groups of animals and human
societies [3]. From small cliques, to the social
order in groups and tribes, all the way to the legal
frameworks of countries, punishment is a widespread
mechanism underlying the formation of common norms.
In particular, altruistic punishment, i.e., the
punishment of norm violators at one's own cost without
personal benefit, is frequent in social dilemmas and is
often used to explain the high level of cooperation in
humans [11]. Within standard economic theory,
which relies on rational selfishness and the dominance of
self-regarding preferences, such behaviors are puzzling, if
not disruptive. This observation calls for the identification
of the generative mechanism(s) underlying altruistic
punishment and of how its occurrence may depend on
context.

Laboratory experiments and field studies suggest 
that egalitarian motives and other-regarding preferences, 
which relate a person's decision to her social environ- 
ment, have a significant influence in social dilemmas and 
in bargaining. Several extensions of
the standard economic approach provide descriptions of
other-regarding preferences by postulating new terms in
utility functions to account for relative income preferences,
envy, inequality aversion and altruism.
While these approaches are based on plausible assump- 
tions, their evolutionary validation remains vague and 
their quantitative coherence with empirical data unveri- 
fied. 

There is growing evidence from a variety of studies 
that pro-social preferences have emerged in hominids over 
hundreds and thousands of years, with deep roots going
further back, as evidenced by recent studies on primates.
The diversity of behavioral traits found in different human
cultures may result from different evolutionary trajectories
as well as from the distinct relative influence of the
cultural versus genetic heritages [1]. A composite picture
is emerging, according to which the perception of fairness,
the reaction to unfair behavior and the individual's
response to its social environment in general are encoded
both in cultural norms and in genes [13].

Cultural norms and genes appear to be subjected to 
complex coevolutionary processes occurring over a spec- 
trum of different time scales. Cultural evolution is 
shaped by biological conditions, while, simultaneously, 
genes are altered in response to the evolutionary forces 
induced by the cultural context. The co-evolutionary 
dynamics and inter-dependencies of genes and cultural 
norms constitute our starting point to understand the 
properties of social preferences revealed in experimental 
economics, field studies and, of course, in real life. 

Experiments on public goods and social dilemma 
games provide convenient tools to study social preferences
in well-defined scenarios under controlled conditions. In
particular, when subjects are given the opportunity to
punish other subjects at their own cost, altruistic behavior
is manifested [13]. In these experiments, one can study in
detail what controls the predisposition of humans to bear
the costs associated with punishing free riders, and how it
may improve the welfare of the group. Even in one-shot
interactions in public goods games, for which reputation
and reciprocal effects are absent, punishment, which is
costly to the punisher and thus in contradiction with
rational choice theory, is frequently observed.

Here we develop computer simulations of synthetic 
agents within an agent-based model (ABM) that describes
the long-term co-evolution of norms and genes
in populations exposed to a typical public goods
dilemma. Our work can be viewed as an extension of
the literature on ABM approaches to the evolution of
cooperation. Specifically, we set up our ABM to
compare with the results of two public goods game
experiments conducted by Fehr and Gachter [10]. Our
modeling strategy is to see the empirical observations 
in Fehr and Gachter's experiments as a snapshot within 
a long-term evolutionary dynamic. Our ABM mimics 
the norm-gene co-evolution that has occurred over hun- 
dreds and thousands of years. We calibrate our model 
by means of empirical data, to quantitatively identify the 
underlying preference types that drive the observed con- 
temporary behavior in the corresponding dilemmas. In 
doing so, our goal is to determine the conditions under 
which agents develop spontaneously a propensity to "al- 
truistically" punish, starting from an initial population of 
uncooperative non-punishers. Here, we specifically look 
into a set of common assumptions made by economists 
to account for altruistic punishment behavior within the 
framework of utility theory: Other-regarding preferences 
in the form of inequality and inequity aversion. 

Initialized by variants of these other-regarding pref- 
erences, the traits of our agents converge to statistically 
stable distributions after long transients, which are taken 
to describe the present-day characteristics of modern
humans. In other words, the experiments of Fehr and
Gachter [10] are interpreted as sampling the statistically
stationary characteristics of a cultural group of
subjects 1 that have evolved over a long time horizon.
Their responses to specific social dilemma situations are
then revealed through the present-day experiments. One
should, however, keep in mind that other patterns of be- 
haviors may have emerged under different norms and ge- 
netic endowments. 



EMPIRICAL MOTIVATION 

The design of our ABM is inspired by the two public
goods game experiments conducted by Fehr and Gachter
[10]. In these experiments, subjects were arranged in
groups of $n = 4$ persons. At the beginning of each period,
subjects received an initial endowment of 20 monetary
units (MUs). Thereafter, subjects could invest
$m \in [0, 20]$ MUs in a common group project, which
returned $g_1 = 1.6$ MUs for each invested MU. The total
return from the project was then equally split and
redistributed to all group members. Thus, the marginal
return per capita was $g_1/n = 0.4$. As long as $g_1/n < 1$,
the game has a vivid social dilemma component and the
setup is susceptible to defection through material
self-interest, since it is individually rational not to
cooperate, while the group is better off if each member
cooperates. Hence one can consider the subjects'
investment as their level of cooperation.
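
The dilemma can be made concrete with a small numerical sketch. The snippet below is an illustration only, not code from the experiments; the function name `payoffs` and its argument names are our own. It computes each member's period payoff (endowment minus investment plus an equal share of the project return) for the parameters quoted above.

```python
def payoffs(investments, endowment=20.0, g1=1.6):
    """Period payoff of each group member in the public goods game.

    Each member keeps what she did not invest and receives an equal
    share g1/n of the total amount invested in the group project.
    """
    n = len(investments)
    share = g1 * sum(investments) / n
    return [endowment - m + share for m in investments]


# Full cooperation: every member ends the period with about 32 MUs.
print(payoffs([20, 20, 20, 20]))
# One free rider among three cooperators: the free rider earns about
# 44 MUs, while each cooperator is left with about 24 MUs.
print(payoffs([0, 20, 20, 20]))
# Universal defection: everybody simply keeps the 20 MU endowment.
print(payoffs([0, 0, 0, 0]))
```

Whatever the others do, investing one MU returns only 0.4 MU to the investor, which is why free riding is individually attractive even though full cooperation beats universal defection.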

In a second, extended run, subjects were additionally
provided with the opportunity to punish other group
members after they had received the project return and
had been informed about the individual contributions.
The use of punishment was associated with costs for both
parties: each MU spent by a punisher led to
$r_p = 3$ MUs being taken from the punished subject [10] 2. The
facts that punishment is costly and that the cost to the
punished subject is larger ($r_p > 1$) are important properties
of the experimental design, which are thought to capture
schematically many real-life situations. Versions with and
without punishment were played both in a partner
treatment, in which the group composition did not change
across periods, and in a stranger treatment. In the latter,
subjects were reassigned to new groups at each period
and thus were only engaged in one-shot interactions
during the entire runtime of the experiment. In total, the
experiments were played for $T_1 = 10$ and $T_2 = 6$ [10]
periods, respectively 3.

The data from Fehr and Gachter as well as from several
other public goods experiments [35, 36, 37] show that
people, if provided the opportunity, frequently punish de- 
fectors, even if this is costly to themselves. In the case of 
repeated interactions, as in the partner treatment, such 
behavior can be explained in a standard way as result- 
ing from a strategic optimization performed by rational 
selfish agents, who select the "direct reciprocity" mech- 
anism for cooperation. What is more surprising is that 
strangers continue to punish at a cost to themselves even 
in one-shot interactions for which there is no positive 
material gain, even in the absence of mechanisms associated
with direct or indirect reciprocity and reputation building.
This behavior is referred to as "altruistic punishment" to 
emphasize the conflict with the behavior expected from 
purely rational agents. The question we address here is 
why humans behave in a way that strictly contradicts ra- 
tional choice, i.e., they continue to cooperate and punish 
at a cost to themselves. 



1 Here undergraduate students from the Federal Institute of
Technology (ETH) and the University of Zurich.

2 In the first experiment, the punisher paid approximately 2 MUs
to take an additional 10% from the punished subject's period profit.

3 To avoid the last-round effect, we consider only data from periods
1-9 and 1-5, respectively.

THE AGENT-BASED MODEL

We extend the setup of the altruistic punishment game
of Fehr and Gachter [10] to construct an ABM of a
population of agents who play a public goods game with
punishment, while adapting and evolving over long periods
according to generic evolutionary dynamics. On the
short time scales of Fehr and Gachter's experiments, the 
traits of the human players probed by the games can be 
considered fixed for each player. In contrast, our
evolutionary ABM aims at determining which superordinate
regime of other-regarding preferences has led our
ancestors to develop traits promoting altruistic punishment
behavior. These traits might again be encoded in the cul- 
tural context, in genes, or both. Thus, we let the traits 
evolve over time according to standard evolutionary dy- 
namics: Adaptation, selection, crossover and mutation. 
In order to capture the possible evolution of the popula- 
tion, agents adapt and die when unfit. Newborn agents 
replace the dead ones, with traits taken from the pool 
of the other surviving agents. Starting from an initial 
population of agents with no cooperation and with no 
propensity to punish others, we will find the emergence 
of long-term stationary populations whose traits are in- 
terpreted to represent those probed by contemporary ex- 
periments, such as those of Fehr and Gachter. 

The results presented below correspond to groups of
$n = 4$ agents, as in Fehr and Gachter's experiments.
At the beginning of the simulation (time $t = 0$), each
agent $i$ is endowed with an initial wealth $w_i(0)$ (in MUs) 4,
which represents its (human and material) capital. Each
agent $i$ is characterized by two traits $[m_i(t); k_i(t)]$, her
level of cooperation and her propensity to punish, which
are subjected to evolutionary forces.

ABM algorithm

A given simulation period $t$ is decomposed into five
sub-periods with the following chronology.

1. Cooperation. Each agent $i$ chooses an amount to
contribute to the group project, referred to as her
level of cooperation. Combining the contributions
of all group members and splitting the resulting return
equally leads to a return per agent given by equation (1).

2. Punishment. Given the returns from the project
and the agents' cooperation levels, which are revealed
to all, each agent may choose to punish the
other agents according to the rule defined by equation (3).

3. Consumption. Each agent consumes the amount
defined by equation (4).

4. Adaptation. Given her profit or loss (P&L) defined
by equation (5), which results from the project
return minus her contribution and minus the cost of
punishments, an agent may adapt her cooperation
level $m_i(t)$ and her propensity to punish $k_i(t)$.

5. Evolution. An agent whose capital drops below zero
dies and is replaced by another agent whose traits
are inherited from the pool of surviving agents with
some additional mutation.

These five steps are now described in detail.

4 Only the transient behavior is sensitive to the choice of this initial
wealth, while the long-term results are independent of this initial
value.

Cooperation step

As noted above, the first trait of a given agent is
her intrinsic willingness to cooperate, denoted by $m_i(t)$.
At each period $t$, agent $i$ contributes $m_i(t)$ MUs to
the group project. As in the lab experiment of Fehr and
Gachter, each MU invested in the group project returns
$g_1 = 1.6$ MUs to the group. Each agent receives the same
return

$$r(t) = \frac{g_1}{n} \sum_{j=1}^{n} m_j(t) \qquad (1)$$

from the group project, resulting in a first-step profit or
loss of

$$s_i(t) = r(t) - m_i(t) = \frac{g_1}{n} \sum_{j=1}^{n} m_j(t) - m_i(t) , \qquad (2)$$

for a given agent $i$, equal to the difference between the
project return and her contribution in period $t$.



Punishment step 

Punishments in our ABM follow the same design as
in the experiments of Fehr and Gachter [10]. To
choose the agents' decision rules on when and how much
to punish, we are guided by figure 1, which shows the
mean expenditure of a given punisher as a function of
the deviation between her contribution and that of the
punished individual, as reported empirically [10]. One
can observe an approximate proportionality between the
amount spent for punishing the lesser contributing agent
by the greater contributing agent and the pairwise
difference $m_j(t) - m_i(t)$ of their contributions. The figure
includes data from both the partner and the stranger
treatments in the two sets of experiments [10]. In our
ABM, this linear dependence, with threshold, is chosen
to represent how an agent $i$ decides to punish another
agent $j$ by spending an amount given by

$$P_{i \to j}(t) = \begin{cases} k_i(t) \, \big( m_i(t) - m_j(t) \big) & \text{if } m_i(t) > m_j(t) , \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$
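
The punishment rule of equation (3) can be sketched in a few lines. This is a minimal illustration under our own naming (`punishment_spent`, `punishment_matrix`), not the authors' implementation.

```python
def punishment_spent(k_i, m_i, m_j):
    """Amount agent i spends on punishing agent j, following equation (3)."""
    if m_i > m_j:
        return k_i * (m_i - m_j)
    return 0.0


def punishment_matrix(k, m):
    """P[i][j] = MUs spent by agent i to punish agent j (zero on the diagonal)."""
    n = len(m)
    return [[punishment_spent(k[i], m[i], m[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]
```

For instance, with a propensity of 0.2, an agent who contributed 20 MUs spends about 2 MUs on punishing a group member who contributed only 10 MUs, and nothing on members who contributed at least as much as she did.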
[Figure 1. Horizontal axis: pairwise deviation of the contribution levels for all group members.]

FIG. 1: Mean expenditure of a given punishing member as
a function of the deviation between her contribution and
that of the punished member, for all pairs of subjects within
a group, as reported empirically [10]. The straight line
crossing zero shows the average decision rule for punishment that
our agents spontaneously evolve to at long times. Its slope
$-k \approx -0.2$ defines the average propensity $k$ to punish (see
the main text). The anomalous punishment of cooperators,
corresponding to the positive range along the horizontal axis,
is neglected in our model.

The coefficient $k_i(t)$, which represents the propensity
to punish, is the second trait that characterizes agent $i$ at
time $t$. It is not fixed to the average value $k \approx 0.2$ found
in the experiments. It is allowed to vary from agent to
agent and it evolves as a function of the successes and
failures experienced by each agent, as explained below. In
addition to being motivated by the data shown in figure 1,
the punishment rule (3) can be thought of as a minimalist
description of a proportional response to defection.
We will see that, provided certain fairness preferences are
active, evolution makes the punishment propensities
$k_i(t)$ self-organize towards a distribution closely
resembling the empirical data.

As a result of being punished, the wealth of the punished
agent $j$ is reduced by the amount spent by agent
$i$ multiplied by the punishment impact factor $r_p$. As in
Fehr and Gachter's second experiment, we fix the
punishment impact factor to $r_p = 3$.

Consumption step

Consumption is introduced as a realistic driving force
to select for successful agents and remove unfit ones. At
each period, consumption absorbs an amount $c(t)$ of the
capital of each agent. The amount is assumed to be
determined by the social context, specifically as the change
of average group wealth $\langle w(t) \rangle$ over the last period:

$$c(t) = \max\big[ \langle w(t-1) \rangle - \langle w(t-2) \rangle ; 0 \big] . \qquad (4)$$

The proportionality between consumption and income
captures the evidence that, for modern societies,
consumption is indeed roughly proportional to income, at
least for the 99% of the population that excludes the
super-wealthy [41, 42]. In primitive societies, larger human
and material capital also led to larger consumption in
the form of producing more offspring 5. The form of
consumption (eq. (4)) is intended to capture the survival
needs and selection pressure that are determined relative
to the social group.

Wealth balance



The total P&L $\hat{s}_i(t)$ of an agent $i$ over one period of
her life is thus the sum of three components: (i) her
first-step P&L $s_i(t)$ from the group project (equation (2)), (ii)
the MUs $\sum_{j \neq i} P_{i \to j}(t)$ spent to punish others and (iii) the
punishments $r_p \sum_{j \neq i} P_{j \to i}(t)$ received from others, where
$P_{i \to j}(t)$ and $P_{j \to i}(t)$ are given by (3):

$$\hat{s}_i(t) = s_i(t) - \sum_{j \neq i} P_{i \to j}(t) - r_p \sum_{j \neq i} P_{j \to i}(t) . \qquad (5)$$

Equation (5) represents the second-step P&L of agent $i$ in
period $t$. Putting this all together, the wealth (fitness)
of agent $i$ thus increases or decreases at each period $t$
according to

$$w_i(t+1) = w_i(t) + \hat{s}_i(t) - c(t) . \qquad (6)$$
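
The bookkeeping of one period (project return, punishment costs, consumption and wealth update) can be summarized in a short sketch. The function names, the argument layout and the representation of punishments as a nested list `P[i][j]` are assumptions made for illustration; only the formulas come from the text.

```python
def consumption(avg_w_prev, avg_w_prev2):
    """Consumption c(t): growth of the average group wealth, floored at zero."""
    return max(avg_w_prev - avg_w_prev2, 0.0)


def wealth_update(w, m, P, c, g1=1.6, r_p=3.0):
    """One-period wealth update for a group.

    w: current wealth per agent; m: contributions per agent;
    P[i][j]: MUs spent by agent i to punish agent j; c: consumption c(t).
    """
    n = len(m)
    r = g1 * sum(m) / n                        # project return per agent
    new_w = []
    for i in range(n):
        s_i = r - m[i]                         # first-step P&L
        spent = sum(P[i][j] for j in range(n) if j != i)
        received = r_p * sum(P[j][i] for j in range(n) if j != i)
        s_hat = s_i - spent - received         # total P&L
        new_w.append(w[i] + s_hat - c)         # wealth balance
    return new_w
```

As a usage example, a cooperator who contributes 20 MUs and spends 2 MUs on each of three members who contributed 10 MUs ends the period with a total P&L of about -6 MUs, while each punished member keeps about 4 MUs; punishing is thus costly for the punisher in the short run.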

Adaptation Dynamics 

The traits $[m_i(t); k_i(t)]$ characterizing each agent $i$ at
a given time $t$ evolve with time according to standard
evolutionary dynamics: adaptation, selection, crossover
and mutation. As has been argued (e.g. by Arthur [43]
or Holland [44]), humans (and our ancestors) are likely to
use inductive reasoning to make decisions. In particular,
this means that humans tend to replace working
hypotheses with new ones when the old ones cease to work. We
adopt this boundedly rational approach to define
adaptation rules determining the propensity to punish and the
level of cooperation chosen by agents.

Adaptation of the propensity to punish. To identify
which type of preference norm drives the evolution of
the trait $k_i(t)$ associated with the propensity to punish
to the level that is observed in the experiments, we
test a distinct set of adaptation rules. Each adaptation rule
corresponds to a specific subset of other-regarding
preference relations. Here, we specifically focus on different
variants of inequality and inequity aversion preferences
and therefore consider the following five types of
agents: (A) self-regarding, (B) inequality averse, (C)
inequity averse, (D) disadvantageous inequality averse and
(E) disadvantageous inequity averse. Here "disadvantageous"
indicates that agents are only inequality/inequity
averse if the inequality/inequity plays to their
disadvantage, whereas pure inequality or inequity averse agents
dislike both situations in which they have been
discriminated against and situations in which they have
discriminated against others.

5 While we account for this larger proportional consumption, we
do not include population dynamics, as our model assumes
a constant group size $n$, with each death being followed by a
corresponding birth.

In each given simulation, we use only homogeneous 
populations, that is, we group only agents of the same 
type. 

A: self-regarding agents: In this universe, an agent
updates her propensity to punish only if her P&L
$\hat{s}_i(t)$ given by (5), obtained in the previous
investment period, turns out to be smaller than her
required consumption $c(t)$. The update consists of
an unbiased random increment according to 6

$$k_i(t+1) = k_i(t) + \kappa_{[-0.005, 0.005]} , \qquad (7)$$

where $\kappa$ is a uniformly distributed random number
drawn from the interval indicated in the subscript.
Only draws of $\kappa$ that ensure the positiveness of
$k_i(t+1)$ are allowed.

B: inequality averse agents: In this universe, an
agent $i$ updates her propensity to punish if her P&L
$\hat{s}_i(t)$ given by (5) is not within a specific tolerance
range $[-l, +l]$ around the average P&L $\bar{s}(t)$ of the other
members of her group, i.e. if $\hat{s}_i(t) < \bar{s}(t) - l$
or $\hat{s}_i(t) > \bar{s}(t) + l$. When this occurs, agent $i$
updates her $k_i(t)$ according to equation (7). We run
multiple simulations initialized with different values
of $l$, as presented in the results section.

C: inequity averse agents: In this universe, agents
set their P&L in relation to their contributions: An
agent $i$ updates her propensity to punish according
to eq. (7) if...

(upside inequity) ...she has contributed less than or
equally to her group fellows ($m_i(t) \le \bar{m}(t)$), where
the average $\bar{m}(t)$ is performed over the contributions
of the other members of her group, and, at the
same time, has received a total P&L $\hat{s}_i(t)$ defined
in (5) larger than or equal to the group average
($\hat{s}_i(t) \ge \bar{s}(t)$, where the average $\bar{s}(t)$ is performed
over the other group members)...

(downside inequity) ...or she has contributed more
than or equally to her group fellows ($m_i(t) \ge \bar{m}(t)$)
and, at the same time, has received a total P&L less
than or equal to the group average ($\hat{s}_i(t) \le \bar{s}(t)$).

D: disadvantageous inequality averse agents: In
this universe, agents only dislike situations in which
the inequality is to their disadvantage:

(downside inequality) An agent $i$ updates her
propensity to punish only if her P&L $\hat{s}_i(t)$ given by
(5) is smaller than the average P&L of the other
members of her group, i.e. $\hat{s}_i(t) < \bar{s}(t)$. When
this occurs for an agent $i$, she updates her $k_i(t)$
according to equation (7).

E: disadvantageous inequity averse agents: As in
setup (C), agents set their P&L in relation
to their contributions; however, they only dislike
situations in which the inequity is detrimental to them.

(downside inequity) If an agent $i$ has contributed
equally or more than her fellows in the group
($m_i(t) \ge \bar{m}(t)$) and, at the same time, has received
a total P&L $\hat{s}_i(t)$ defined in (5) smaller than or
equal to the group average ($\hat{s}_i(t) \le \bar{s}(t)$), then she
updates her propensity to punish according to eq. (7).
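
The five update conditions (A) to (E) differ only in the test that triggers a revision of the propensity to punish. The following sketch collects them in a single predicate; the function name, the argument names and the string labels for the agent types are hypothetical choices made for this illustration.

```python
def wants_to_update_k(agent_type, s_hat, s_bar, m, m_bar, c, tol=0.0):
    """Return True if an agent of the given type revises her propensity to punish.

    s_hat: own total P&L; s_bar: mean P&L of the other group members;
    m: own contribution; m_bar: mean contribution of the others;
    c: required consumption; tol: tolerance l (used by type B only).
    """
    upside = m <= m_bar and s_hat >= s_bar      # did well with little effort
    downside = m >= m_bar and s_hat <= s_bar    # did badly despite effort
    if agent_type == "A":       # self-regarding
        return s_hat < c
    if agent_type == "B":       # inequality averse
        return s_hat < s_bar - tol or s_hat > s_bar + tol
    if agent_type == "C":       # inequity averse
        return upside or downside
    if agent_type == "D":       # disadvantageous inequality averse
        return s_hat < s_bar
    if agent_type == "E":       # disadvantageous inequity averse
        return downside
    raise ValueError(f"unknown agent type: {agent_type!r}")
```

A successful free rider (low contribution, high P&L) triggers an update under rules (B) and (C) but not under (A), (D) or (E), which makes the asymmetry between the symmetric and the "disadvantageous" preference types explicit.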

Adaptation of the cooperation level. Following the
proverb "necessity is the mother of all invention," agents
adapt their cooperation level $m_i(t)$ under adverse
conditions, i.e., when their P&L in the last investment round
is smaller than the required consumption $c(t)$. When
this adverse situation occurs, an agent randomly updates
her contribution according to
$m_i(t+1) = m_i(t) + \epsilon_{[-0.005, 0.005]}$, where $\epsilon$ is a
random number uniformly distributed in the interval
indicated in the subscript 7. Since contributions are
non-negative, only those draws of $\epsilon$ are
kept that ensure the positiveness of $m_i(t+1)$.
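
Both traits are updated with the same kind of bounded random step. A minimal sketch is given below; the helper name `random_step` is ours, and rejecting and redrawing any increment that would make the trait negative is our reading of the positivity constraint, not a detail confirmed by the text.

```python
import random


def random_step(value, half_width=0.005, rng=random):
    """Add a uniform increment from [-half_width, half_width] to a trait.

    Draws that would make the trait negative are rejected and redrawn
    (an assumption about how the positivity constraint is enforced).
    """
    while True:
        candidate = value + rng.uniform(-half_width, half_width)
        if candidate >= 0.0:
            return candidate
```

Passing a seeded generator, e.g. `random_step(0.0, rng=random.Random(1))`, makes a simulation run reproducible.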

6 Our results are robust to changes of the width of the interval, as
long as it remains symmetric around zero.

7 Our results are robust to changes of the width of this interval as
long as it remains symmetric.

Replicator Dynamics: Selection, crossover and
mutation

In addition to the adaptation of the agents' traits
$[m_i(t); k_i(t)]$ described above, evolution occurs by
replacing under-performing agents. When an agent's wealth
$w_i(t)$ drops below zero, the agent dies and is replaced by
a new one with different traits $[m_i(t+1); k_i(t+1)]$,
determined by those of the surviving agents of the group.
The following variants give essentially the same results.

• To capture the phenomenon that norms and genes 
are inherited from the pool of existing agents, a 
first variant assumes that the new agent has her 
traits simply equal to the average of those of the 
surviving agents. 

• A second variant consists in assuming a noisy in- 
heritance of the average traits of the group mem- 
bers. Specifically, the new agent has traits equal to 
the average of those of the surviving agents plus a 
noise proportional to the standard deviation of the 
surviving agents' traits. 

• A third variant first determines clusters among 
the agents, including the newborn, using a stan- 
dard clustering algorithm on the vectors of the two 
traits. The existence of several clusters is taken 
to account for the possible emergence of heteroge- 
neous sets of norms within the group. Then, the 
newborn takes the average traits of the agents in 
her cluster, decorated by an additional mutation 
implemented by adding a random number to each 
trait proportional to the standard deviation of the 
surviving agents' traits. 
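
The first two replacement variants can be sketched as follows. The function name and arguments are hypothetical, and two details are assumptions made only for this illustration: the noise is drawn from a Gaussian whose width is `noise_scale` times the survivors' standard deviation, and the resulting traits are clipped at zero. The clustering-based third variant is not covered.

```python
import random
import statistics


def newborn_traits(survivors, noise_scale=0.0, rng=random):
    """Traits [m, k] of a newborn agent derived from the surviving agents.

    survivors: list of (m, k) pairs. With noise_scale = 0 this is the
    first variant (plain average); with noise_scale > 0 it is the second
    variant, with noise scaled by the survivors' standard deviation.
    """
    traits = []
    for values in zip(*survivors):
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)
        noise = rng.gauss(0.0, noise_scale * spread) if noise_scale > 0 else 0.0
        traits.append(max(mean + noise, 0.0))    # keep traits non-negative
    return traits
```

For example, `newborn_traits([(10, 0.2), (20, 0.4), (0, 0.0)])` returns traits close to `[10.0, 0.2]`, the plain averages of the survivors' contributions and propensities.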

THE EMERGENCE OF ALTRUISTIC 
PUNISHMENT 

We have run our ABM with thousands of independent
groups of $n = 4$ agents over one million simulation
periods $t$. The agents have a double-peaked distribution of
lifetimes, with a mean of the order of $1.7 \cdot 10^4$ periods
and a median of the order of $1.5 \cdot 10^4$ periods 8. This
expresses that, while many agents die at an early age,
many also survive beyond their "childhood", enabling
them to adapt their traits. Each run thus corresponds
to several hundreds of generations. Each simulation has
been initialized with all agents being uncooperative
non-punishers, i.e., $k_i(0) = 0$ and $m_i(0) = 0$ for all $i$.
After a long transient, we observe that the distribution
of propensities to punish converges to different
stationary functions depending on which adaptation dynamics
(A, B, C, D or E) for $k$ is active.

In the following we give a qualitative analysis of
our simulation results. Selfish agents adapting their
propensity to punish according to dynamics (A) remain
weak punishers, and no significant "altruistic
punishment" is observed, as shown in the inset of figure 2.
In contrast, for agents endowed with inequality or
inequity aversion (adaptation rules B to E), stationary
states of the propensity to punish emerge spontaneously,
each with different characteristics. For all adaptation
rules (B to E), altruistic punishment
emerges endogenously as an evolutionarily stable trait in
the competitive resource-limited world described by our
model. The key ingredients responsible are the variants
of other-regarding preferences (B, C, D or E). It should
be stressed that a symmetric (upside and downside)
inequity/inequality aversion is not needed as a condition for
altruistic punishment to emerge. The selfish
disadvantageous inequality or inequity aversion (dynamics D and
E) is sufficient.

We now turn to a quantitative characterization of the
properties of the altruistic cooperators that evolved in
our ABM. Figure 2 compares the stationary distributions
of the propensities to punish obtained with our ABM at
long times for the four adaptation dynamics (B to E)
with that obtained in Fehr and Gachter's experiment 9.
The propensities to punish in the experimental data have
been inferred as follows: Knowing the contributions
$m_i > m_j$ of two subjects $i$ and $j$ and the punishment level
$P_{i \to j}$ of subject $i$ on subject $j$, the propensity to punish
characterizing subject $i$ is determined by

$$k_i = \frac{P_{i \to j}}{m_i - m_j} .$$

Applying this recipe to all pairs of subjects in a given
group, we obtain twelve measures of propensities to
punish per group. We then take the average over these twelve
values to obtain a single robust estimate associated
with a given group. Sampling all groups and all periods,
we obtain the distribution shown in figure 2 (continuous
thick line (emp)).
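
The inference recipe amounts to inverting the punishment rule for each pair of subjects and averaging within the group. The sketch below illustrates this; the function name and data layout are hypothetical, and restricting the average to ordered pairs with $m_i > m_j$ (for which the ratio is defined) is our assumption about how undefined pairs are handled.

```python
def group_propensity(contributions, punishments):
    """Average inferred propensity to punish within one group.

    contributions[i]: contribution of subject i;
    punishments[i][j]: amount subject i spent on punishing subject j.
    Only ordered pairs with contributions[i] > contributions[j] yield a
    measure k = P / (m_i - m_j); returns None if no such pair exists.
    """
    n = len(contributions)
    estimates = [
        punishments[i][j] / (contributions[i] - contributions[j])
        for i in range(n) for j in range(n)
        if i != j and contributions[i] > contributions[j]
    ]
    return sum(estimates) / len(estimates) if estimates else None
```

Collecting the returned values over all groups and periods would give the sample from which an empirical distribution such as the one labeled (emp) can be estimated.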

Figure 2 suggests that the adaptation dynamics E
(disadvantageous inequity averse agents) is the only one able
to fit the empirical distribution. Indeed, the
Kolmogorov-Smirnov test cannot reject the hypothesis that the $k$
values evolved in our ABM world for the adaptation
dynamics E and the empirical values are drawn from the same
distribution at the 80% confidence level (p-value $\approx 0.2$).
In contrast, the other adaptation dynamics A to D are
strongly rejected (p-value = 0). Given the simplicity of
our ABM and of its underlying assumptions, it is strik- 
ing to find such detailed quantitative agreement for one 
of our dynamics. 

For selfish agents (dynamics A), we found that the 
weak level of punishment that evolved in the ABM is 
entirely controlled by selection pressure, and thus the 
presence of consumption. In particular, in the absence of 
any selection pressure, the level of altruistic punishment 



These lifetimes correspond to a population of disadvantageous 
inequity avers agents (E) 



We used the data from the [si lldll stranger treatments. 
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FIG. 2: Distributions of propensities to punish in Fehr and 
Gachter's experiments (emp) and in our ABM at long times 
under the five different adaptation dynamics (A to E) under 
800 system realizations. The inset shows the smoothed dis- 
tribution for the self-regarding agent population (adaptation 
dynamics A), while the main frame plots the smoothed dis- 
tributions obtained by the adaptation rules B to E as well 
as for the empirical data (emp). All distributions has been 
smoothed by a standard kernel method, (emp): Distribution 
estimated from Fehr and Gachter's experiments using the pro- 
cedure explained in the text; the other distribution estimates 
correspond to the adaptation dynamic A to E. For adapta- 
tion dynamic (B) the plot shows the obtained distributions 
for all tolerance range parameters / G 0.1; 0.075; 0.05; 0.025; 
(from the mode close to the origin (I = 0.1) to the mode far- 
thest (2 = 0). The parameters of our ABM simulations are: 
n = 4,#i = 1.6, r-p = 3. 



drops to zero. In all other cases, it remains too small to
explain the empirical results of Fehr and Gächter. For
the inequality-averse population (B), it becomes obvious
that the empirical distribution cannot be reproduced
within a symmetric variation of the tolerance range
parameter l: Figure [2] shows the resulting distributions for the
tested values l ∈ {0; 0.025; 0.05; 0.075; 0.1} as thin dashed
lines, with l = 0.1 corresponding to the mode closest to
the origin, going stepwise to l = 0 with the mode farthest
from the origin. For inequity-averse agents (dynamics C)
and disadvantageous inequality-averse agents (dynamics
D), we find levels of altruistic punishment that far
exceed the empirical evidence. Our quantitative comparison
with Fehr and Gächter's experiments supports the
hypothesis that human subjects are well described as being
disadvantageous inequity averse, corroborating and
complementing previous evidence [13, [3, El- The results
obtained with our ABM simulations suggest that
the co-evolution of norms and genes promoting altruistic
punishment has been influenced by disadvantageous
inequity aversion in the presence of simple inductive strategies,
leading to self-sustained co-evolving traits made robust
by reinforcing feedbacks.
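
The caption of figure [2] states that all distributions were smoothed by a standard kernel method, without naming the kernel. As a minimal sketch, assuming a Gaussian kernel and a hand-picked bandwidth (both assumptions, as are the function names and the sample values), the smoothing and the location of the mode could be computed as follows:

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x.

    Assumption: Gaussian kernel with a fixed bandwidth; the paper only says
    "a standard kernel method".
    """
    if bandwidth <= 0 or not samples:
        raise ValueError("need a positive bandwidth and at least one sample")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observed value.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def mode_on_grid(density, lo, hi, steps):
    """Return the grid point in [lo, hi] at which the density is largest."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=density)


# Hypothetical propensities to punish, for illustration only.
k_samples = [0.18, 0.20, 0.22, 0.20, 0.60]
density = gaussian_kde(k_samples, bandwidth=0.05)
print(mode_on_grid(density, 0.0, 1.0, 100))
```

The bandwidth controls how strongly neighbouring values are merged, so the position of the mode should be checked for several bandwidths before it is compared across adaptation dynamics.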

The distribution of propensities k_i to punish exhibits a
mode around k = 0.2, which means that most punishers
spend an amount approximately equal to one-fifth of the
experienced differences in contributions. Note that the
value of the mode around k = 0.2 is close to the slope of
the straight line fitting the empirical data shown in
figure [1], providing another confirmation of the explanatory
power of our ABM. This most probable value k = 0.2
has also been obtained analytically by assuming an
evolutionary optimization of the expected gains with respect
to potential future losses due to punishment [34].
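
To make the meaning of k concrete, here is a minimal sketch of a punishment rule that is linear in the contribution difference. It assumes that an agent punishes only those who contributed less than it did (the disadvantageous variant) and that each unit spent removes r_p units from the target. The function names and the exact form of the rule are illustrative assumptions, not the model's definition.

```python
def punishment_spent(k, m_punisher, m_target):
    """Amount the punisher spends on the target.

    Assumption: spending is k times the positive part of the difference
    between the punisher's and the target's contribution.
    """
    diff = m_punisher - m_target
    return k * diff if diff > 0 else 0.0


def punishment_received(spent, r_p=3.0):
    """Loss inflicted on the target; r_p > 1 makes punishment efficient."""
    return r_p * spent


# With k = 0.2, a gap of 10 units costs the punisher about 2 units
# and the target about 6 units (for r_p = 3).
spent = punishment_spent(0.2, m_punisher=50.0, m_target=40.0)
print(spent, punishment_received(spent))
```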

THE EFFECT OF THE PROPENSITY TO 
PUNISH ON COOPERATION 

We now demonstrate with our model that punishment
is a key stabilization mechanism for sustaining cooperation.
For this, we need some destabilizing process that
tends to destroy cooperation in the absence of punishment.
The experiments of Fehr and Gächter suggest
such a mechanism. A detailed analysis of the
period-by-period decision outcomes made by human subjects
shows evidence of short-term persistence in their updates
of cooperation levels: previous changes m_i(t-2) -> m_i(t-1)
of contributions that led to a larger return from
the project (s_i(t-1) > s_i(t-2)) are followed by subsequent
updates m_i(t-1) -> m_i(t) with the same trend, i.e.,
[m_i(t) - m_i(t-1)] · [m_i(t-1) - m_i(t-2)] > 0. We refer
to this behavior as using a "trend-following strategy."
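
The trend-following condition can be checked directly on a sequence of contributions and returns. The sketch below is an illustration under assumptions: the function names and the toy data are hypothetical, and it only measures how often an update keeps its direction after a period in which the return increased.

```python
def same_trend(m_tm2, m_tm1, m_t):
    """True if two successive changes in contribution have the same sign."""
    return (m_t - m_tm1) * (m_tm1 - m_tm2) > 0


def trend_following_share(m, s):
    """Share of qualifying periods followed by a same-direction update.

    m: contributions per period, s: returns from the project per period.
    A period t qualifies if the previous change led to a larger return,
    i.e. s[t-1] > s[t-2]. Returns None if no period qualifies.
    """
    qualifying = followed = 0
    for t in range(2, len(m)):
        if s[t - 1] > s[t - 2]:
            qualifying += 1
            if same_trend(m[t - 2], m[t - 1], m[t]):
                followed += 1
    return followed / qualifying if qualifying else None


# Toy series: the first qualifying update continues the trend,
# the second one reverses it.
contributions = [10, 12, 15, 14]
returns = [5, 6, 7, 6]
print(trend_following_share(contributions, returns))
```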

When we add the trend-following strategy to our 4-step
algorithm as described above, but in the absence of
punishment (all k's are set equal to 0), we find that
cooperation that was previously maintained in the presence
of punishment decays after a few thousand periods,
as shown in figure [3]. In contrast, if punishment is
restored in the presence of the destabilizing trend-following
strategy, cooperation remains stable. Note also that the
emergence of the stable distribution of propensities to
punish reported in figure [2] is robust to the addition of the
trend-following strategy used by the agents.

Figure [4] shows the average level of cooperation in a
group of 4 agents after a transient period of 20,000
simulation periods for 800 system realizations as a function
of the propensity k to punish, when the level of cooperation
for all agents is initially drawn from a uniformly
distributed random variable in {49,51}. It reveals that
the level of cooperation undergoes a bifurcation from zero
to significant levels of cooperation for a value of k close
to the mode of the empirical distribution of the propensities
to punish. This suggests that evolution may have
selected an "optimal" propensity to altruistically punish
defectors in order to sustain cooperation.
FIG. 3: Average group contribution for a group of 4 agents
with punishment (k = 0.2, continuous line) and without (k = 0,
dashed line) over 20,000 simulation periods and 16 system
realizations. The initial contribution m_i(0) for all agents i
of a group is randomly drawn from a uniform distribution in
{49,51}.
FIG. 5: Average punishment spent to punish defectors for a
group of 4 agents as a function of k after an equilibrium time
of 20000 simulation periods and 800 system realizations. k is
fixed to the corresponding value on the x-axis and the initial
contribution m_i(0) in period 0 for all agents i of a group is
randomly drawn from a uniform distribution in {49, 51}.
To corroborate this hypothesis, figure [5] plots the
average amount of MUs spent to punish a defector during
10000 simulation periods for 3200 system realizations as
a function of the propensity to punish k. As in the setup
of figure [4], the level of cooperation m_i(t) for all agents
is initialized at period t = 0 by a random variable
uniformly distributed in {49,50}. We now consider the
intrinsic propensity to punish k as the potential exposure
of defectors to being punished. The results show clearly
that a higher deterrence, i.e. a higher value of k,
effectively causes less exertion of costly punishment in order
to maintain a certain level of cooperation and norm
conformity, respectively. This responsive behavior has been
manifested in many empirical observations [46, 47].

Given that disadvantageous inequity aversion is the
dominant preference type, cooperation and norm conformity
can be maintained at lower costs if the propensity
to punish, i.e. the level of deterrence, exceeds the
tipping point at k ≈ 0.2. This again substantiates that
evolution may have selected an "optimal" propensity to
punish to sustain cooperation and prevent defection in
contexts where people behave in a disadvantageous
inequity-averse manner. Comparable results have been obtained
using a different simulation model, as reported in [45].

FIG. 4: Average group contribution for a group of 4 agents
as a function of k after an equilibrium time of 20000
simulation periods and 800 system realizations. k is fixed to the
corresponding value on the x-axis and the initial contribution
m_i(0) for all agents i of a group is randomly drawn from a
uniform distribution in {49, 51}.


CONCLUSION 

Our first principal result is that a high level of altru- 
istic punishment behavior emerges spontaneously from 
a population of agents who are initially uncooperative 
and non-punishers. We have shown how this results from 
evolution with adaptation, selection, crossover and mu- 
tation, in a population of agents endowed with different 
variants of inequality or inequity aversion. We stress that 
our use of the terms "inequality" or "inequity aversion" 
does not mean that a new term is added in the agents' 
utility function that controls their decisions; in contrast, 
we only assume that agents may punish as a reaction to 
differences in observed contributions to a group project. 
Thus, a key ingredient is the possibility for agents to
punish, at a cost to themselves, and that the punishment be
efficient (in the sense r_p > 1). As a consequence of the
co-evolution of traits characterizing agents' cooperation and
punishment, we obtain the emergence of altruistic
punishment behavior. A fully symmetric (upside and
downside) inequity/inequality aversion is not needed to obtain
our results, with the selfish disadvantageous inequality or
inequity aversion being sufficient.

Our second main result is the identification of disad- 
vantageous inequity aversion as the most relevant under- 
lying mechanism to explain the emergence and the degree 
of altruistic punishment observed in public goods exper- 
iments. This result has been obtained by combining em- 
pirical data with an evolutionary agent-based simulation 
model in an innovative way. Our ABM is able to repro- 
duce quantitatively, without adjustable parameters, the 
experimental results concerning the most likely levels of 
punishment behavior, as well as their full distribution. 
This result is of particular importance to substantiate the 
assumptions made by economists in order to describe re- 
alistic behavior within the framework of rational choice: 
Humans exhibit other-regarding, and in particular, dis- 
advantageous inequity aversion preferences in their deci- 
sion process. 

In conclusion, we believe that the combination of em- 
pirical research and agent-based modeling as done here 
can provide deeper insights into the apparently non- 
rational behavior of humans. For instance, with regard 
to the often-cited importance of altruistic punishment 
in promoting cooperation, our ABM provides a flexible 
and powerful methodology to answer many remaining re- 
search questions, including the influence of group interac- 
tions, competitions between variants of other-regarding 
preferences or the interplay of other mechanisms, as well 
as more realistic set-ups in which agents are playing sev- 
eral games simultaneously so as to mimic a real life situ- 
ation in which cognitive abilities and human capital are 
scarce resources. 

We are grateful to E. Fehr and S. Gächter for sharing
their unpublished data with us and to U. Hoffrage
for constructive remarks on the manuscript. The work